Information Theory and Quadrature Rules

James S. Wolper 



Abstract: Quadrature rules estimate $\int_0^1 f(x)\,dx$ when $f$ is
defined by a table of n + 1 values. Every binary string of length 
n defines a quadrature rule by choosing which endpoint of 
each interval represents the interval. The standard rules, such 
as Simpson's Rule, correspond to strings of low Kolmogorov 
complexity, making it possible to define new quadrature rules 
with no smoothness assumptions, as well as in higher dimensions. 
Error results depend on concepts from compressed sensing. Good 
quadrature rules exist for "sparse" functions, which also satisfy 
an error-information duality principle. 

I. Introduction 

RESEARCHERS have been showing how information theory clarifies results about mathematics and computing ever since Shannon [5] defined the basic concepts. This work considers quadrature, or numerical integration, from an information theory perspective. The basic problem is to estimate $\int_0^1 f(x)\,dx$ from a table of values $f_i = f(i/n)$, $i = 0, \dots, n$. This kind of problem arises naturally in applications where, for example, one may only be able to estimate the value of a function during a satellite pass, or at a discrete set of ambient conditions such as temperature, or, in the social sciences, on Tuesdays.

Standard works on numerical analysis (e.g., [1], [4]) develop quadrature methods that require one of two conditions that are impossible to guarantee. Many methods (e.g., Gaussian quadrature) require evaluation of $f$ at arbitrary points in its domain, which is impossible in the situation at hand. Other methods (e.g., Newton-Cotes integration; see below) impose smoothness conditions on $f$. This, too, is problematic: imagine the effect of an earthquake, a phase transition, or a scandal on the functions whose measurement is described above.

Estimation without control over the error is unsatisfying. 
Integrating a function from a table is a kind of signal process- 
ing, and ideas from signal reconstruction lead to two error 
estimates, at least for functions that have a sparse (although 
perhaps unknown) representation. The first is a kind of error- 
information duality for integration; briefly, the information 
in the error is the error in the information. The second is 
the existence of good quadrature rules for sparse functions. 
Section V has details, including the definition of sparse. 

Here is the outline. Section II defines a primitive quadrature rule for estimating $\int_0^1 f(x)\,dx$ from any binary string of length $n$. A quadrature program for $\int_0^1 f(x)\,dx$ is the mean of the estimates from several primitive quadrature rules.

Section III develops 

Theorem 1: Each Newton-Cotes estimate for $\int_0^1 f(x)\,dx$
corresponds to a quadrature program based on strings of low 
Kolmogorov complexity. 
Basing these rules on computational complexity rather than smoothness extends them to the case when there is no smoothness assumption on $f$.

Section IV discusses quadrature over higher dimensional 
domains. The interpolation technique behind Newton-Cotes 
no longer applies, but strings of low Kolmogorov complexity 
define new quadrature rules. 

Section V shows how concepts from Signal Reconstruction or Compressed Sensing ([3]) provide information about the error terms, at least for "sparse" functions.

Section VI speculates on further applications of these ideas. 

II. Quadrature Programs 

The Riemann integral $\int_0^1 f(x)\,dx$ depends on having full information about $f$. (By scaling and translation, restricting to integrals over the domain $[0, 1]$ causes no loss of generality.) Briefly, the domain is subdivided, and $f$ is sampled in each subdomain. One then takes the limit, as the mesh goes to zero, of the sums $\sum_i f(x_i^*)\,\Delta x_i$, where $x_i^*$ is the sample point and $\Delta x_i$ is the size of the corresponding subinterval.

Sampling and, therefore, computing the limit is not feasible when $f$ is known only through a table of values $f_i = f(i/n)$, $i = 0, \dots, n$. In this case, one typically chooses one of the endpoints of each interval as the sample point.

For convenience, let $h = 1/n$ denote the size of each subinterval.

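Definition 1: A primitive quadrature rule from the binary string $b = b_1 b_2 \cdots b_n$ for $f$ is the sum $h\tilde f_1 + \cdots + h\tilde f_n$, where

$$\tilde f_i = \begin{cases} f_{i-1} & \text{if } b_i = 0,\\ f_i & \text{if } b_i = 1. \end{cases}$$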

In other words, the binary string $b$ is an input to the program below, in which the array entry b[i] holds the bit $b_{i+1}$ (C arrays start at 0).

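#include <stdbool.h>

// f holds the n+1 table values f_0..f_n; b[i] selects an endpoint
// of the subinterval [x_i, x_{i+1}], and h = 1/n is its width.
float Quadrature(const float f[], const bool b[], int n, float h) {
    float q = 0.0f;
    for (int i = 0; i < n; i++) {
        if (!b[i]) {
            q += h * f[i];      // left endpoint
        } else {
            q += h * f[i + 1];  // right endpoint
        }
    }
    return q;
}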
Definition 2: A quadrature program is the estimate obtained by taking the mean of the estimates from a finite collection (repetitions allowed) of primitive quadrature rules.

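As a minimal sketch of Definitions 1 and 2 (an illustration, not the author's code), the C functions below encode each binary string as a character string of '0' and '1' and average the corresponding primitive rules over a table on $[0, 1]$ with $h = 1/n$. The names `primitive_rule` and `quadrature_program` are hypothetical; repeating a string in the list is how a program weights one rule more heavily than another.

```c
/* Primitive quadrature rule (Definition 1) for a table f[0..n] on [0, 1]:
   character s[i] is the bit b_{i+1}; '0' picks the left endpoint f[i] of
   the (i+1)-th subinterval and '1' picks the right endpoint f[i+1]. */
double primitive_rule(const double f[], const char *s, int n) {
    double h = 1.0 / n, q = 0.0;
    for (int i = 0; i < n; i++)
        q += h * (s[i] == '1' ? f[i + 1] : f[i]);
    return q;
}

/* Quadrature program (Definition 2): the mean of the k primitive rules
   given by strings[0..k-1]; a repeated string counts once per copy. */
double quadrature_program(const double f[], const char *strings[], int k,
                          int n) {
    double sum = 0.0;
    for (int j = 0; j < k; j++)
        sum += primitive_rule(f, strings[j], n);
    return sum / k;
}
```

For instance, tabulating $f(x) = x^3$ at $n = 4$ and averaging the strings 0000, 1111, and 1010 gives 0.25 up to rounding, the exact integral, since Simpson's Rule is exact for cubics.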


A. Example

Suppose that $n = 7$, so $f$ is defined by $f_0, \dots, f_7$. The string $b = 0011101$ yields the estimate

$$(f_0 + f_1 + f_3 + f_4 + f_5 + f_5 + f_7)\,h,$$

while $b = 0001111$ yields

$$(f_0 + f_1 + f_2 + f_4 + f_5 + f_6 + f_7)\,h.$$

B. Strings of Low Complexity

The binary strings of lowest complexity are $00\cdots0$ and $11\cdots1$. These correspond to using the left and the right endpoints of each interval, respectively. In the first case, though, the final value $f_n$ has no effect on the estimate of the integral, while in the second the initial value $f_0$ is ignored. A remedy is to take the mean of the two estimates so obtained. A simple calculation shows that the estimate is then

$$\frac{f_0 + 2f_1 + \cdots + 2f_{n-1} + f_n}{2n},$$

which is the well-known trapezoid rule.

Proposition 1: The trapezoid rule is the mean of the quadrature rules $00\cdots0$ and $11\cdots1$.

Alternatively, for even $n$, the trapezoid rule is the mean of the quadrature rules $0101\cdots01$ and $1010\cdots10$.

The next most complex strings are $0101\cdots01$ and $1010\cdots10$. Simpson's Rule ([1]) estimates the integral as

$$\frac{f_0 + 4f_1 + 2f_2 + \cdots + 4f_{n-1} + f_n}{3n}.$$

Proposition 2: For even $n$, Simpson's Rule is the mean of the quadrature rules $00\cdots0$, $11\cdots1$, and $1010\cdots10$.

III. Comparison with Newton-Cotes Quadrature

More generally, Newton-Cotes integration uses the Lagrange interpolation polynomial of degree $n$ to derive an approximation that is exact when $f$ is a polynomial of degree at most $n$; see [4]. Here are some common Newton-Cotes formulas, along with their interpretations as quadrature programs. Notice that in each case the complexity of the strings involved is quite low. Also notice that in each case the estimate is a weighted average of the tabulated values.

A. n = 3, or Simpson's Three-Eighths Rule

In this case, the function is sampled at four equally spaced points $(x_0, y_0)$, $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, and

$$\int_0^1 f(x)\,dx \approx \frac{3h}{8}\,(y_0 + 3y_1 + 3y_2 + y_3).$$

This estimate is the mean of eight primitive quadrature rules: three from 000, three from 111, one from 100, and one from 110. To confirm, the 000 rule yields $y_0 + y_1 + y_2$; the 111 rule yields $y_1 + y_2 + y_3$; the 100 rule yields $y_1 + y_1 + y_2$; and the 110 rule yields $y_1 + y_2 + y_2$. With their multiplicities, these add up to $3y_0 + 9y_1 + 9y_2 + 3y_3$; divide by 8 to get the mean, multiply by $h$, and factor out the 3.

B. n = 4, or Boole's Rule

In this case, the function is sampled at five equally spaced points $(x_0, y_0)$, $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$, and

$$\int_0^1 f(x)\,dx \approx \frac{2h}{45}\,(7y_0 + 32y_1 + 12y_2 + 32y_3 + 7y_4).$$

This corresponds to the mean of 45 primitive quadrature rules: 12 each from 0000 and 1111, two from 0011, and 19 from 1010.

This is twelve copies of the three strings of Simpson's Rule, plus seven more copies of 1010 and two of 0011. The 0011 strings choose the endpoints farthest from the center $x_2$ of the table, while the extra 1010 strings concentrate on the alternate points $x_1$ and $x_3$.

At this point the proof of Theorem 1 is clear.

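The weight claims above can be checked mechanically. The sketch below (a hypothetical helper, with strings encoded as '0'/'1' characters as in Section II) tallies how often each table entry is selected when each string is repeated with a given multiplicity; multiplying the tallies by $h$ and dividing by the total number of strings gives the weights of the quadrature program.

```c
/* w[i] = number of times f_i (or y_i) is selected across all primitive
   rules, where strings[j] is used mult[j] times; each string has n bits.
   The program's estimate is h * sum_i w[i] * f_i / (sum_j mult[j]). */
void program_counts(const char *strings[], const int mult[], int k, int n,
                    int w[]) {
    for (int i = 0; i <= n; i++) w[i] = 0;
    for (int j = 0; j < k; j++)
        for (int i = 0; i < n; i++)   /* '1' -> right endpoint i+1 */
            w[strings[j][i] == '1' ? i + 1 : i] += mult[j];
}
```

For the three-eighths strings (multiplicities 3, 3, 1, 1) the tallies are 3, 9, 9, 3 over 8 strings; for Boole's strings (12, 12, 2, 19) they are 14, 64, 24, 64, 14 over 45 strings, and $\frac{h}{45}(14, 64, 24, 64, 14) = \frac{2h}{45}(7, 32, 12, 32, 7)$.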
IV. Higher Dimensions 

The one-dimensional Newton-Cotes methods use an inter- 
polating polynomial of degree d. One needs d + 1 distinct 
points to determine the coefficients of this polynomial. This 
is easy when the domain is an interval. 

The situation is different in higher dimensions. The dimension of the vector space of polynomials of degree at most $d$ in $n$ variables is

$$\binom{d+n}{n};$$

this is the number of coefficients or, since passing through a given point imposes one linear constraint on the polynomial, the number of points required to determine the coefficients uniquely.

The number of grid points of a single cube is $2^n$, but adjoining adjacent cubes leads to other grid point counts. The difficulty is matching the number of grid points to the number of coefficients. As a rule, this is impossible.

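For a concrete instance of the mismatch, here is a worked count with $n = 2$ variables and degree $d = 2$:

$$\binom{2+2}{2} = 6 \qquad (\text{monomials } 1,\ x,\ y,\ x^2,\ xy,\ y^2),$$

while a single square has $2^2 = 4$ grid points and a $2 \times 2$ block of squares has $3^2 = 9$. For $d = 1$ there are $\binom{1+2}{2} = 3$ coefficients, again not 4.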
Any sequence of digits from $0$ to $2^n - 1$ still determines a primitive quadrature rule. Look, for example, at an $m \times m$ array in dimension 2, which is made up from primitive (i.e., $2 \times 2$) squares. Arbitrarily label the corners of each square 0, 1, 2, and 3, for example starting at the northwest corner and proceeding clockwise.

Now, consider the mean of the four low-complexity sequences $00\cdots0$, $11\cdots1$, $22\cdots2$, and $33\cdots3$. (Each has length $m^2$.) In each primitive square, the corresponding entry in the sequence determines which grid point to choose.

The result is a quadrature rule that weights each of the 
corner points with weight 1, each of the non-corner edge 
points with weight 2, and each of the interior points with 
weight 4; the weighted sum is then divided by 4. 

Theorem 2: The mean of the four low-complexity sequences $00\cdots0$, $11\cdots1$, $22\cdots2$, and $33\cdots3$ defines a quadrature rule with weights

$$\begin{pmatrix} 1 & 2 & \cdots & 2 & 1\\ 2 & 4 & \cdots & 4 & 2\\ \vdots & \vdots & & \vdots & \vdots\\ 2 & 4 & \cdots & 4 & 2\\ 1 & 2 & \cdots & 2 & 1 \end{pmatrix}.$$

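A short sketch of Theorem 2, assuming the labeling above (corner 0 is northwest, then clockwise) and row indices increasing southward; the helper name and the size limit MAXM are assumptions of the sketch. The constant sequence $kk\cdots k$ picks corner $k$ of every square, so summing over the four sequences counts, for each grid point, the number of squares that contain it.

```c
#define MAXM 16

/* w[r][c] = how many of the four constant corner sequences select grid
   point (r, c) of an m x m array of squares ((m+1) x (m+1) points).
   Dividing by 4 gives the mean. Requires m <= MAXM. */
void corner_weights(int m, int w[MAXM + 1][MAXM + 1]) {
    /* (row, col) offsets of corners 0 (NW), 1 (NE), 2 (SE), 3 (SW) */
    static const int dr[4] = {0, 0, 1, 1}, dc[4] = {0, 1, 1, 0};
    for (int r = 0; r <= m; r++)
        for (int c = 0; c <= m; c++) w[r][c] = 0;
    for (int label = 0; label < 4; label++)   /* sequence label...label */
        for (int r = 0; r < m; r++)
            for (int c = 0; c < m; c++)       /* square with NW corner (r, c) */
                w[r + dr[label]][c + dc[label]] += 1;
}
```

With $m = 3$ the corners get 1, the other edge points 2, and the four interior points 4, matching the matrix in Theorem 2.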
V. Error Results 

So far, there has been no mention of an error estimate of the kind associated with the Newton-Cotes rules. Notice that all of the Newton-Cotes rules take as input the same table of values $\{f_i\}$, but higher-order estimates give a tighter error bound. While there is no more information to be gained from the table of values, the extra smoothness assumptions of the higher-order methods seem to provide more information about the function itself.

The error of one of these estimates is zero when the sampled 
function is a polynomial of sufficiently low degree. When 
the function is a polynomial of low degree then the table of 
values can contain no more information than the ordered set 
of coefficients. The heuristic is that more information about a 
function enables a tighter error estimate. 

Recently, Donoho ([3]) and others have investigated the problem of Compressed Sensing (CS), which is to reconstruct a signal represented as a vector from a sample of its entries. Donoho showed that knowing that a vector can be compressed is enough to reconstruct it, even without knowing what the compressed version might have been. When integrating, the goal is to process the signal rather than to reconstruct it, but the same principle applies.

This section contains two results. The first, following Donoho ([3]), relates the information in the error in an integral estimate to the error in the information in the description of the integrand, a kind of Error-Information Duality. The second proves that for sparse functions (see below) there exists a quadrature program estimating the integral to arbitrary precision.

A. Error-Information Duality 

Following Donoho, the functions of interest have the form $f(x) = \sum_j a_j \phi_j(x)$, where the functions $\phi_j$ form a basis for an appropriate space of functions. (The space for which they form a basis is intentionally left vague in order to be as general as possible.) The function is sparse if, for some $R > 0$,

$$\|a\|_p \le R,$$

where $0 < p < 2$ and $\|a\|_p$ is the $\ell^p$ norm of the sequence of coefficients $a_0, a_1, \dots$. A function whose expansion has many small terms fails to be sparse by this definition, while a finite-degree polynomial expansion is sparse.

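To illustrate the definition, here is a worked example with two coefficient sequences chosen for illustration:

$$a_j = 2^{-j}\ (j \ge 0): \qquad \|a\|_p^p = \sum_{j \ge 0} 2^{-jp} = \frac{1}{1 - 2^{-p}} < \infty \quad \text{for every } p > 0;$$

$$a_j = 1/j\ (j \ge 1): \qquad \|a\|_p^p = \sum_{j \ge 1} j^{-p} = \infty \quad \text{for } 0 < p \le 1.$$

The geometrically decaying sequence is sparse for every admissible $p$, while the slowly decaying one, with its many small terms, is not sparse for any $p \le 1$.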
Let $X_{p,n}(R)$ denote the space of functions given by a table of $n$ values which are $\ell^p$-sparse in the sense above. This is the space of functions of interest.

Begin with the functions $f$ with $d+1$ nonzero coefficients, generalizing the space of polynomials of degree at most $d$. Renumber if necessary so that the nonzero coefficients are $a_0, \dots, a_d$.

The entries in the table of values $\mathbf{f} = [f_0 \cdots f_n]^T$ are $f_i = \sum_{j=0}^{d} a_j \phi_j(i/n)$. Let $\Phi$ denote the $(n+1) \times (d+1)$ matrix with entries $\phi_j(i/n)$. Let $a = [a_0 \cdots a_d]^T$. Then $\mathbf{f} = \Phi a$. The matrix $\Phi$ depends only on the basis.

Now, integrate $f$. First, let $q_j = \int_0^1 \phi_j(x)\,dx$, and let $Q = [q_0, \dots, q_d]^T$; like $\Phi$, $Q$ depends only on the basis. Since $f(x) = \sum_{j=0}^{d} a_j \phi_j(x)$,

$$\int_0^1 f(x)\,dx = \sum_{j=0}^{d} a_j q_j = Q^T a.$$

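Next, suppose that $\Phi$ has a left inverse $\Phi^{+}$, so that $\Phi^{+}\Phi = I$, noting that this is never the case when $\Phi$ has fewer rows than columns. Then $a = \Phi^{+}\mathbf{f}$, and

Theorem 3: When the expansion of $f$ has $d+1$ nonzero coefficients and $\Phi$ has a left inverse $\Phi^{+}$ (which requires $n \ge d$), then $\int_0^1 f(x)\,dx = Q^T \Phi^{+} \mathbf{f}$. ■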

Compare this theorem with the exactness results for 
Newton-Cotes integrals of degree d polynomials. 

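To connect Theorem 3 with the Newton-Cotes formulas of Section III, the sketch below assumes the monomial basis $\phi_j(x) = x^j$ and $d = n$, so that $\Phi$ is a square Vandermonde matrix whose inverse serves as the left inverse. It computes weights $w_0, \dots, w_n$ with $\sum_i w_i f_i = Q^T \Phi^{-1} [f_0 \cdots f_n]^T$ by solving $\Phi^T w = Q$, using $q_j = \int_0^1 x^j\,dx = 1/(j+1)$. The function name and the size limit MAXN are hypothetical.

```c
#define MAXN 8

/* Solve Phi^T w = Q by Gauss-Jordan elimination with partial pivoting,
   where Phi[i][j] = (i/n)^j and Q[j] = 1/(j+1). Requires 1 <= n <= MAXN. */
void exact_weights(int n, double w[]) {
    int m = n + 1;
    double A[MAXN + 1][MAXN + 2];            /* augmented matrix [Phi^T | Q] */
    for (int j = 0; j < m; j++) {
        for (int i = 0; i < m; i++) {        /* entry (j, i) = phi_j(i/n) */
            double x = (double)i / n, p = 1.0;
            for (int e = 0; e < j; e++) p *= x;
            A[j][i] = p;
        }
        A[j][m] = 1.0 / (j + 1);             /* q_j */
    }
    for (int c = 0; c < m; c++) {
        int piv = c;                          /* choose the largest pivot */
        for (int r = c + 1; r < m; r++) {
            double a = A[r][c] < 0 ? -A[r][c] : A[r][c];
            double b = A[piv][c] < 0 ? -A[piv][c] : A[piv][c];
            if (a > b) piv = r;
        }
        for (int k = 0; k <= m; k++) {
            double t = A[c][k]; A[c][k] = A[piv][k]; A[piv][k] = t;
        }
        for (int r = 0; r < m; r++) {        /* clear column c in other rows */
            if (r == c) continue;
            double factor = A[r][c] / A[c][c];
            for (int k = c; k <= m; k++) A[r][k] -= factor * A[c][k];
        }
    }
    for (int i = 0; i < m; i++) w[i] = A[i][m] / A[i][i];
}
```

With $n = 2$ the weights come out as $(1/6, 2/3, 1/6)$, which is Simpson's Rule with $h = 1/2$; with $n = 4$ they are $(7, 32, 12, 32, 7)/90$, which is Boole's rule with $h = 1/4$, up to rounding.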
When there are more than n nonzero coefficients, the 
integral can be estimated by truncating the series expansion 
to include the n "most important" coefficients. The truncated 
function is integrated exactly, so the error in the estimate 
comes from the coefficients that were ignored. The truncated 
function contains n floats worth of information, plus a 
little more to describe where these coefficients are in the 
series expansion. The table of values has n floats worth of 
information as well. The information in the error in the integral 
estimate is exactly the information in the ignored coefficients. 
Hence 

Theorem 4 (Error-Information Duality): The information 
content of the error is (a digest of) the error in the known 
information about the integrand. ■ 

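The truncation argument can be made concrete. If the retained part of the expansion is integrated exactly, the error of the estimate is $\sum_{j \notin S} a_j q_j$, where $S$ is the set of retained indices and $q_j = \int_0^1 \phi_j(x)\,dx$. The sketch below assumes that "most important" means largest $|a_j|$; the function name and the MAXTERMS limit are hypothetical.

```c
#define MAXTERMS 64

static double magnitude(double x) { return x < 0 ? -x : x; }

/* Keep the k coefficients of largest magnitude, treat the truncated
   expansion as integrated exactly, and return the resulting error: the
   sum of a[j] * q[j] over the ignored indices j. Requires len <= MAXTERMS. */
double truncation_error(const double a[], const double q[], int len, int k) {
    char kept[MAXTERMS] = {0};
    for (int t = 0; t < k && t < len; t++) {
        int best = -1;                       /* largest unkept |a[j]| */
        for (int j = 0; j < len; j++)
            if (!kept[j] && (best < 0 || magnitude(a[j]) > magnitude(a[best])))
                best = j;
        kept[best] = 1;
    }
    double err = 0.0;
    for (int j = 0; j < len; j++)
        if (!kept[j]) err += a[j] * q[j];
    return err;
}
```

For example, with coefficients $(1, 0.5, 0.01, 0.001)$ in the monomial basis ($q_j = 1/(j+1)$) and two retained terms, the error is $0.01/3 + 0.001/4 \approx 0.00358$.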
B. Good Quadrature Rules 

Now suppose that $f$ is a sparse function in the sense of the section above, so that there exists a good estimate

$$(\dagger)\qquad \int_0^1 f(x)\,dx \approx \sum_{i=0}^{n} a_i f_i.$$

This section shows

Theorem 5: For any $\epsilon > 0$ there exists a quadrature program that approximates (†) within $\epsilon$.

Proof. Choose rational numbers $y_i/r$ such that $\sup_i |a_i - y_i/r| < \epsilon/n$. Here $r$ is any convenient common denominator. Notice that $\sum_i a_i = 1$ because of the weighted-average nature of (†), so the $y_i$ can be chosen with $\sum_i y_i/r = 1$ as well. The proof finds quadrature programs that reproduce the coefficients $y_i/r$.

Choose $r$ primitive quadrature rules $b^{(l)} = b_1^{(l)} b_2^{(l)} \cdots b_n^{(l)}$, where $l$ runs from 1 to $r$. Each $b^{(l)}$ leads to an estimate as in Section II, part A.

Now, consider the contribution of each $f_i$ to the mean of these $r$ estimates; its coefficient must equal $y_i/r$. The only contribution from $f_0$ occurs when $b_1^{(l)} = 0$, so

$$y_0 = h\sum_{l=1}^{r} \left(1 - b_1^{(l)}\right).$$

The only contribution from $f_n$ occurs when $b_n^{(l)} = 1$, so

$$y_n = h\sum_{l=1}^{r} b_n^{(l)}.$$

The contribution from $f_i$, where $i$ is neither $0$ nor $n$, occurs when $b_{i+1}^{(l)} = 0$ (left endpoint) or when $b_i^{(l)} = 1$ (right endpoint), so

$$y_i = h\sum_{l=1}^{r} \left(1 - b_{i+1}^{(l)} + b_i^{(l)}\right).$$

Next, solve for the $b_i^{(l)}$. From the $f_0$ coefficient, $h\sum_l b_1^{(l)} = hr - y_0$. Plug this into the relation for the $f_1$ coefficient, so $y_1 = h\sum_l \left(1 - b_2^{(l)}\right) + hr - y_0$, implying that $h\sum_l b_2^{(l)} = 2hr - y_0 - y_1$. Continuing in this way shows that

$$h\sum_{l=1}^{r} b_i^{(l)} = ihr - y_0 - y_1 - \cdots - y_{i-1}.$$

Finally, $y_n = h\sum_l b_n^{(l)}$, but this is redundant, since the recursion already gives $h\sum_l b_n^{(l)} = nhr - y_0 - y_1 - \cdots - y_{n-1}$ and $nh = 1$. Since $\sum_i y_i = r$, the theorem is proved. ■

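The recursion in the proof can be run directly. In the sketch below, the integers $y_i$ (summing to $r$) give target weights $y_i/r$, and with $h = 1/n$ the proof's relation becomes $c_i = ir - n(y_0 + \cdots + y_{i-1})$, where $c_i$ is the number of the $r$ strings whose bit $b_i$ is 1. The counts are forced by the weights, so the sketch also reports when some $c_i$ falls outside $[0, r]$; for example, a weight larger than $h$ on $f_0$ cannot be produced, since at most all $r$ strings can start with 0. The helper names and MAXBITS are hypothetical.

```c
#include <stdbool.h>

#define MAXBITS 16

/* c[i] (i = 1..n) = i*r - n*(y[0] + ... + y[i-1]). Returns false if the
   y[i] do not sum to r, n > MAXBITS, or some c[i] lies outside [0, r],
   in which case no program of r strings reproduces the weights y[i]/r. */
bool column_counts(int n, const long y[], long r, long c[]) {
    long total = 0;
    for (int i = 0; i <= n; i++) total += y[i];
    if (total != r || n > MAXBITS) return false;
    long prefix = 0;                          /* y[0] + ... + y[i-1] */
    for (int i = 1; i <= n; i++) {
        prefix += y[i - 1];
        c[i] = (long)i * r - (long)n * prefix;
        if (c[i] < 0 || c[i] > r) return false;
    }
    return true;
}

/* Fill out[l] (l = 0..r-1) with an n-character string whose bit i is '1'
   exactly for the first c[i] strings, so bit position i has c[i] ones. */
void build_strings(int n, long r, const long c[], char out[][MAXBITS + 1]) {
    for (long l = 0; l < r; l++) {
        for (int i = 1; i <= n; i++)
            out[l][i - 1] = (l < c[i]) ? '1' : '0';
        out[l][n] = '\0';
    }
}
```

With $n = 2$, $r = 6$, and $y = (1, 4, 1)$, the six strings are 11, 11, 10, 10, 00, 00, that is, two copies of each string in Proposition 2, which reproduces Simpson's Rule.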
VI. Further Work 

One foresees two kinds of further work. The first involves 
the concept of integration. Suppose that one makes a random 
choice of binary string(s) to define a quadrature program: what 
is the probability that this program is good? The sample space 
here is well-defined, namely, binary strings, but the concept of 
"good" needs refinement, especially with regard to the space of 
functions to be integrated. Integrating smooth functions allows 
one to compare the results with Newton-Cotes quadrature, but 
seems excessively restrictive in terms of the applications in 
the introduction. Perhaps it would be better to survey functions obtained by choosing random coefficients in a wavelet basis [2].

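One way to begin exploring the first question is simulation. The sketch below draws $k$ random strings using the C library's `rand` (an arbitrary choice of generator) and returns the mean of the corresponding primitive rules; what counts as "good" is left to the caller, for instance by comparing with a known integral. One fact holds for any such program: if the table is increasing, every primitive rule, and hence every mean of them, lies between the left-endpoint and right-endpoint sums. The function name is hypothetical.

```c
#include <stdlib.h>

/* Mean of k primitive rules whose bits are drawn at random; rand() is
   seeded with `seed` so runs repeat on a given platform. f holds the n+1
   table values on [0, 1]. */
double random_program(const double f[], int n, int k, unsigned seed) {
    double h = 1.0 / n, sum = 0.0;
    srand(seed);
    for (int j = 0; j < k; j++)
        for (int i = 0; i < n; i++)        /* each bit chooses an endpoint */
            sum += h * (rand() % 2 ? f[i + 1] : f[i]);
    return sum / k;
}
```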
There is also further work possible from the perspectives 
of signal processing, compressed sensing, and cryptography. 
One way to think of $\int_0^1 f(x)\,dx$ is to think of $f$ as a message and the integral as a message digest. From a cryptographic perspective, this is not a good message digest, because the information from the high-order bits of the message has no effect on the low-order bits of the digest, while an ideal message digest should appear random. Can one characterize other message digests in terms of the information content added by the algorithm? Is this a measure of security?

From the signal processing perspective, the function $f$ represents some signal and the integral is a simple form of on-line processing. It is a simple matter to integrate against a kernel $K(t)$, that is, to estimate $\int_0^1 K(t)f(t)\,dt$, as long as one has enough information about $K$. But what of more complex processes like convolution? These problems are particularly interesting in the context of compressed sensing: what is the information-theoretic meaning of an integral transform when the function $f$ is compressible?
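For the kernel case, here is a minimal sketch assuming $K$ is known at the same nodes $i/n$ as $f$: apply the trapezoid rule (the mean of the strings $00\cdots0$ and $11\cdots1$) to the products $K_i f_i$. The function name is hypothetical, and any other quadrature program could replace the trapezoid weights.

```c
/* Trapezoid estimate of the integral of K(t) f(t) over [0, 1] from the
   n+1 tabulated values K[i] = K(i/n) and f[i] = f(i/n). */
double kernel_trapezoid(const double K[], const double f[], int n) {
    double h = 1.0 / n;
    double s = 0.5 * (K[0] * f[0] + K[n] * f[n]);   /* endpoint weights 1/2 */
    for (int i = 1; i < n; i++)
        s += K[i] * f[i];                           /* interior weights 1 */
    return h * s;
}
```

With $K(t) = t$ and $f \equiv 1$ tabulated at $n = 4$, the estimate is $0.5$, the exact value, since the trapezoid rule is exact for linear integrands.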